85 research outputs found

    Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks

    Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows a model to be built on a representative subgraph of the large scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically showcase that real-world networks have multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show a major advantage of our proposed approach, i.e. the ability to locate good quality clusters at both the coarser and finer levels of hierarchy, using internal cluster quality metrics on 7 real-life networks. Comment: PLOS ONE, Vol 9, Issue 6, June 201
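
The bottom-up use of increasing distance thresholds can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the distance measure or the merge rule, so the Euclidean distance between cluster centroids, the union-find merge, and the hand-picked thresholds and points below are all assumptions made for illustration.

```python
from math import dist


def centroid(points, members):
    """Mean of the projected points that belong to one cluster."""
    dim = len(points[0])
    return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]


def merge_level(points, clusters, threshold):
    """One bottom-up step: merge clusters whose centroids are closer than `threshold`."""
    parent = list(range(len(clusters)))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    cents = [centroid(points, c) for c in clusters]
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            if dist(cents[a], cents[b]) < threshold:
                parent[find(a)] = find(b)

    merged = {}
    for idx, members in enumerate(clusters):
        merged.setdefault(find(idx), []).extend(members)
    return [sorted(m) for m in merged.values()]


def build_hierarchy(points, thresholds):
    """Return one clustering per threshold, ordered from finest to coarsest."""
    clusters = [[i] for i in range(len(points))]  # start from singletons
    levels = []
    for t in sorted(thresholds):
        clusters = merge_level(points, clusters, t)
        levels.append(clusters)
    return levels


# Hypothetical 2-D projections; in the paper these would come from the eigenspace.
points = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.1, 0.0], [5.0, 0.0], [5.1, 0.0]]
thresholds = [0.2, 1.5, 6.0]  # assumed values, not derived from data
levels = build_hierarchy(points, thresholds)
for t, level in zip(thresholds, levels):
    print(t, level)
```

Each larger threshold can only merge clusters found at the previous level, which is what makes the resulting levels nested.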

    Kernel Spectral Clustering and applications

    In this chapter we review the main literature related to kernel spectral clustering (KSC), an approach to clustering cast within a kernel-based optimization setting. KSC represents a least-squares support vector machine based formulation of spectral clustering described by a weighted kernel PCA objective. Just as in the classifier case, the binary clustering model is expressed by a hyperplane in a high dimensional space induced by a kernel. In addition, the multi-way clustering can be obtained by combining a set of binary decision functions via an Error Correcting Output Codes (ECOC) encoding scheme. Because of its model-based nature, the KSC method encompasses three main steps: training, validation and testing. In the validation stage model selection is performed to obtain tuning parameters, like the number of clusters present in the data. This is a major advantage compared to classical spectral clustering where the determination of the clustering parameters is unclear and relies on heuristics. Once a KSC model is trained on a small subset of the entire data, it is able to generalize well to unseen test points. Beyond the basic formulation, sparse KSC algorithms based on the Incomplete Cholesky Decomposition (ICD) and $L_0$, $L_1$, $L_0 + L_1$, Group Lasso regularization are reviewed. In that respect, we show how it is possible to handle large scale data. Also, two possible ways to perform hierarchical clustering and a soft clustering method are presented. Finally, real-world applications such as image segmentation, power load time-series clustering, document clustering and big data learning are considered. Comment: chapter contribution to the book "Unsupervised Learning Algorithms
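
As an illustration of how binary decision functions can be combined through an ECOC-style codebook, the sketch below binarizes the scores into sign patterns, keeps the most frequent training patterns as codewords, and assigns a point to the nearest codeword in Hamming distance. This is one plausible reading of the scheme, not the chapter's code; the function names and the score values are hypothetical.

```python
from collections import Counter


def sign_code(scores):
    """Binarize real-valued decision scores into a tuple of +1/-1 signs."""
    return tuple(1 if s >= 0 else -1 for s in scores)


def build_codebook(train_scores, n_clusters):
    """Keep the `n_clusters` most frequent sign patterns as codewords."""
    counts = Counter(sign_code(s) for s in train_scores)
    return [code for code, _ in counts.most_common(n_clusters)]


def hamming(a, b):
    """Number of positions in which two sign patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(scores, codebook):
    """Index of the codeword nearest (in Hamming distance) to the sign pattern."""
    code = sign_code(scores)
    return min(range(len(codebook)), key=lambda j: hamming(code, codebook[j]))


# Hypothetical scores from two binary decision functions (three clusters).
train_scores = [
    (0.9, 0.8), (0.7, 0.6), (0.8, 0.9),   # sign pattern (+, +)
    (-0.9, 0.7), (-0.8, 0.5),             # sign pattern (-, +)
    (-0.7, -0.9), (-0.6, -0.8),           # sign pattern (-, -)
]
codebook = build_codebook(train_scores, n_clusters=3)
print(codebook)
print(codebook[assign((0.4, 0.2), codebook)])
```

Because the codebook is learned from training points only, the same `assign` call applies unchanged to unseen test points.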

    Academic dishonesty among Italian nursing students: A longitudinal study

    Considering the ethical issues related to nursing and that Ethics is an integral part of the nursing education in the degree course, one would suppose that academic dishonesty might be less frequent in nursing students than in students of other disciplines. However, several studies show that this trend of deceitful behaviour seems to be similar among the university nursing students and those of other disciplines. The aim of this study is to investigate the phenomenon of academic dishonesty in the classroom from a longitudinal perspective within a cohort of Italian nursing students. A non-experimental longitudinal design was used. All nursing students were recruited from the Nursing Science Bachelor Degree Program of a big Italian university in the centre of Italy and participants were part of an ongoing longitudinal research project which started in 2011 on nursing students' wellbeing. The results show that students get accustomed to taking academically deceitful actions. They come to consider their behaviours acceptable and normal, thereby stabilizing them, which increases the probability of stabilizing subsequent deceitful behaviours. The stability through time of academic cheating behaviours committed during higher education, within the study's timeframe, provides important perspectives into the establishment of rigorous standards of ethical and moral behaviours by the student

    Colorectal Cancer Stage at Diagnosis Before vs During the COVID-19 Pandemic in Italy

    IMPORTANCE Delays in screening programs and the reluctance of patients to seek medical attention because of the outbreak of SARS-CoV-2 could be associated with the risk of more advanced colorectal cancers at diagnosis. OBJECTIVE To evaluate whether the SARS-CoV-2 pandemic was associated with more advanced oncologic stage and change in clinical presentation for patients with colorectal cancer. DESIGN, SETTING, AND PARTICIPANTS This retrospective, multicenter cohort study included all 17 938 adult patients who underwent surgery for colorectal cancer from March 1, 2020, to December 31, 2021 (pandemic period), and from January 1, 2018, to February 29, 2020 (prepandemic period), in 81 participating centers in Italy, including tertiary centers and community hospitals. Follow-up was 30 days from surgery. EXPOSURES Any type of surgical procedure for colorectal cancer, including explorative surgery, palliative procedures, and atypical or segmental resections. MAIN OUTCOMES AND MEASURES The primary outcome was advanced stage of colorectal cancer at diagnosis. Secondary outcomes were distant metastasis, T4 stage, aggressive biology (defined as cancer with at least 1 of the following characteristics: signet ring cells, mucinous tumor, budding, lymphovascular invasion, perineural invasion, and lymphangitis), stenotic lesion, emergency surgery, and palliative surgery. The independent association between the pandemic period and the outcomes was assessed using multivariate random-effects logistic regression, with hospital as the cluster variable. RESULTS A total of 17 938 patients (10 007 men [55.8%]; mean [SD] age, 70.6 [12.2] years) underwent surgery for colorectal cancer: 7796 (43.5%) during the pandemic period and 10 142 (56.5%) during the prepandemic period. 
Logistic regression indicated that the pandemic period was significantly associated with an increased rate of advanced-stage colorectal cancer (odds ratio [OR], 1.07; 95% CI, 1.01-1.13; P = .03), aggressive biology (OR, 1.32; 95% CI, 1.15-1.53; P < .001), and stenotic lesions (OR, 1.15; 95% CI, 1.01-1.31; P = .03). CONCLUSIONS AND RELEVANCE This cohort study suggests a significant association between the SARS-CoV-2 pandemic and the risk of a more advanced oncologic stage at diagnosis among patients undergoing surgery for colorectal cancer and might indicate a potential reduction of survival for these patients

    Entropy-Based Incomplete Cholesky Decomposition for a Scalable Spectral Clustering Algorithm: Computational Studies and Sensitivity Analysis

    Spectral clustering methods allow datasets to be partitioned into clusters by mapping the input datapoints into the space spanned by the eigenvectors of the Laplacian matrix. In this article, we make use of the incomplete Cholesky decomposition (ICD) to construct an approximation of the graph Laplacian and reduce the size of the related eigenvalue problem from N to m, with m ≪ N. In particular, we introduce a new stopping criterion based on normalized mutual information between consecutive partitions, which terminates the ICD when the change in the cluster assignments is below a given threshold. Compared with existing ICD-based spectral clustering approaches, the proposed method reduces the number m of selected pivots (i.e., yields a sparser model) while maintaining high clustering quality. The method scales linearly with respect to the number of input datapoints N and has low memory requirements, because only matrices of size N × m and m × m are calculated (in contrast to standard spectral clustering, where the construction of the full N × N similarity matrix is needed). Furthermore, we show that the number of clusters can be reliably selected based on the gap heuristic computed using just a small matrix R of size m × m instead of the entire graph Laplacian. The effectiveness of the proposed algorithm is tested on several datasets
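
A stopping rule of the kind described, based on normalized mutual information (NMI) between consecutive partitions, could look roughly like the following. The normalization by the geometric mean of the two entropies, the handling of trivial partitions, and the tolerance value are assumptions; the abstract does not say which NMI variant or threshold the authors use.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information between two labelings of the same points."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Assumed convention: two one-cluster partitions count as identical.
        return 1.0 if h_a == h_b else 0.0
    return mutual_info / sqrt(h_a * h_b)  # geometric-mean normalization (assumed)


def should_stop(previous, current, tol=1e-3):
    """Stop once consecutive partitions are (nearly) identical."""
    return 1.0 - nmi(previous, current) < tol


# Same partition with permuted label names: NMI is 1, so iteration would stop.
print(should_stop([0, 0, 1, 1], [1, 1, 0, 0]))
# Independent partitions: NMI is 0, so iteration would continue.
print(should_stop([0, 0, 1, 1], [0, 1, 0, 1]))
```

NMI is invariant to renaming of cluster labels, which matters here because labels produced after adding a pivot need not match the labels from the previous iteration.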

    Clustering Evolving Data using Kernel-Based Methods (Clusteren van evoluerende data met behulp van kernel-gebaseerde methodes)

    Thanks to recent developments of Information Technologies, there is a profusion of available data in a wide range of application domains ranging from science and engineering to biology and business. For this reason, the demand for real-time data processing, mining and analysis has experienced explosive growth in recent years. Since labels are usually not available and in general a full understanding of the data is missing, clustering plays a major role in shedding initial light on the data. In this context, elements such as generalization to out-of-sample data, model selection criteria, consistency of the clustering results over time and scalability to large data become key issues. A successful modelling framework is offered by Least Squares Support Vector Machine (LS-SVM), which is designed in a primal-dual optimization setting. The latter allows extensions of the core models by adding additional constraints to the primal problem, by changing the objective function or by introducing new model selection criteria. In this thesis, we propose several modelling strategies to tackle evolving data in different contexts. In the framework of static clustering, we start by introducing a soft kernel spectral clustering (SKSC) algorithm, which deals with overlapping clusters better than kernel spectral clustering (KSC) and provides more interpretable outcomes. Afterwards, a complete strategy based upon KSC for community detection of static networks is proposed, where the extraction of a high quality training sub-graph, the choice of the kernel function, the model selection and the applicability to large-scale data are key aspects. 
This paves the way for the development of a novel clustering algorithm for the analysis of evolving networks called kernel spectral clustering with memory effect (MKSC), where the temporal smoothness between clustering results in successive time steps is incorporated at the level of the primal optimization problem, by properly modifying the KSC formulation. Later on, an application of KSC to fault detection of an industrial machine is presented. Here, a smart pre-processing of the data by means of a proper windowing operation is necessary to catch the ongoing degradation process affecting the machine. In this way, in a genuinely unsupervised manner, it is possible to raise an early warning when necessary, in an online fashion. Finally, we propose a new algorithm called incremental kernel spectral clustering (IKSC) for online learning of non-stationary data. This challenge is addressed by taking advantage of the out-of-sample property of kernel spectral clustering (KSC) to adapt the initial model, in order to tackle merging, splitting or drifting of clusters across time. Real-world applications considered in this thesis include image segmentation, time-series clustering, and community detection of static and evolving networks. status: publishe

    Supervised aggregated feature learning for multiple instance classification

    © 2016 Elsevier Inc. This paper introduces a novel algorithm, called Supervised Aggregated FEature learning or SAFE, which combines both (local) instance level and (global) bag level information in a joint framework to address the multiple instance classification task. In this realm, the collective assumption is used to express the relationship between the instance labels and the bag labels, by means of taking the sum as aggregation rule. The proposed model is formulated within a least squares support vector machine setting, where an unsupervised core model (either kernel PCA or kernel spectral clustering) at the instance level is combined with a classification loss function at the bag level. The corresponding dual problem consists of solving a linear system, and the bag classifier is obtained by aggregating the instance scores. Synthetic experiments suggest that SAFE is advantageous when the instances from both positive and negative bags can be naturally grouped in the same cluster. Moreover, real-life experiments indicate that SAFE is competitive with the best state-of-the-art methods. Publisher: Elsevier. Journal: Information Sciences. Article link: http://dx.doi.org/10.1016/j.ins.2016.09.060. status: publishe
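
The aggregation step, where a bag classifier is obtained by summing instance scores, can be sketched in a few lines. The instance scores, the bias term, and the sign-based decision rule below are hypothetical placeholders; how SAFE actually learns the instance scores (by solving a linear system in the dual) is not reproduced here.

```python
def bag_score(instance_scores, bias=0.0):
    """Collective assumption with the sum as aggregation rule (bias is assumed)."""
    return sum(instance_scores) + bias


def classify_bag(instance_scores, bias=0.0):
    """Label a bag +1 or -1 from the sign of its aggregated score."""
    return 1 if bag_score(instance_scores, bias) >= 0 else -1


# Hypothetical instance-level scores for two bags.
bags = {
    "bag_a": [0.8, -0.1, 0.4],
    "bag_b": [-0.6, -0.3, 0.2],
}
labels = {name: classify_bag(scores) for name, scores in bags.items()}
print(labels)
```

With a sum rule every instance contributes to the bag decision, in contrast to rules that let a single instance decide the bag label.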

    Efficient multiple scale kernel classifiers

    © 2016 IEEE. While kernel methods using a single Gaussian kernel have proven to be very successful for nonlinear classification, for learning problems with a more complex underlying structure it is often desirable to use a linear combination of kernels with different widths. To address this issue, this paper presents a classification algorithm based on a jointly convex constrained optimization formulation. The primal problem is defined as jointly learning a combination of kernel classification models formulated in different feature spaces, which account for various representations or scales. The solution can be found either by solving a system of linear equations in the case of equal combination weights or by means of a block coordinate descent scheme. The dual model is represented by a classifier using multiple kernels in the decision function. Furthermore, time and space complexity are reduced by adopting a divide and conquer strategy and through the use of the Nyström approximation of the eigenfunctions. Several experiments show the effectiveness of the proposed algorithms in dealing with datasets containing up to millions of instances. status: publishe
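
A decision function that uses a linear combination of Gaussian kernels with different widths might be written as below. The kernel parameterization exp(-||x - y||^2 / (2 sigma^2)), the equal weights, the support points and the coefficients are all illustrative assumptions; the paper's training procedure is not shown.

```python
from math import exp


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel with bandwidth `sigma` (parameterization assumed)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq_dist / (2.0 * sigma ** 2))


def multiscale_kernel(x, y, sigmas, weights):
    """Weighted sum of Gaussian kernels, one per scale."""
    return sum(w * gaussian_kernel(x, y, s) for w, s in zip(weights, sigmas))


def decision(x, support, alphas, bias, sigmas, weights):
    """Sign of a kernel expansion built on the combined kernel."""
    total = sum(
        a * multiscale_kernel(x, s, sigmas, weights)
        for a, s in zip(alphas, support)
    )
    return 1 if total + bias >= 0 else -1


# Hypothetical model: two support points with signed coefficients.
support = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, -1.0]
sigmas = [0.5, 2.0]    # a narrow and a wide scale
weights = [0.5, 0.5]   # equal combination weights

print(decision([0.1, 0.1], support, alphas, 0.0, sigmas, weights))
print(decision([1.9, 2.0], support, alphas, 0.0, sigmas, weights))
```

The narrow kernel resolves fine local structure while the wide one carries information between distant points, which is the motivation the abstract gives for mixing scales.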

    Fast Kernel Spectral Clustering

    © 2017 Spectral clustering suffers from a scalability problem in both memory usage and computational time when the number of data instances N is large. To solve this issue, we present a fast spectral clustering algorithm able to effectively handle millions of datapoints at a desktop PC scale. The proposed technique relies on a kernel-based formulation of the spectral clustering problem, also known as kernel spectral clustering. In this framework, the Nyström approximation of the feature map of size m, with m ≪ N, is used to solve the primal optimization problem. This leads to a reduction of time complexity from O(N^3) to O(mN) and space complexity from O(N^2) to O(mN). The effectiveness of the proposed algorithm in terms of computational efficiency and clustering quality is illustrated on several datasets. software: link www.kuleuven.Be status: publishe
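
To make the idea of an explicit m-dimensional Nyström feature map concrete, here is a small standard-library sketch. It swaps in a Cholesky factor of the landmark kernel matrix where Nyström implementations usually use an eigendecomposition; both give features whose inner products equal the same low-rank kernel approximation. The RBF kernel, the landmark points and the jitter value are assumptions, and this is not the paper's software.

```python
from math import exp, sqrt


def rbf(x, y, sigma=1.0):
    """RBF kernel (assumed choice of kernel)."""
    return exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = sqrt(s) if i == j else s / low[j][j]
    return low


def nystrom_map(landmarks, sigma=1.0, jitter=1e-10):
    """Return phi with phi(x) in R^m and dot(phi(x), phi(y)) ~= k(x, y)."""
    m = len(landmarks)
    k_mm = [
        [rbf(landmarks[i], landmarks[j], sigma) + (jitter if i == j else 0.0)
         for j in range(m)]
        for i in range(m)
    ]
    low = cholesky(k_mm)

    def phi(x):
        k_x = [rbf(x, z, sigma) for z in landmarks]
        out = []
        for i in range(m):  # forward substitution: solve low * out = k_x
            out.append((k_x[i] - sum(low[i][k] * out[k] for k in range(i))) / low[i][i])
        return out

    return phi


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


landmarks = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # m = 3 hypothetical landmarks
phi = nystrom_map(landmarks)
approx = dot(phi(landmarks[0]), phi(landmarks[1]))
exact = rbf(landmarks[0], landmarks[1])
print(approx, exact)
```

Storing one length-m feature vector per datapoint takes O(mN) memory, and the full N × N kernel matrix is never formed.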